Motivated by multi-objective optimization, we study extrema of a set of N points independently
distributed inside the d-dimensional hypercube. A point in this set is k-dominated by another point
when at least k of its coordinates are larger, and is a k-minimum if it is not k-dominated by any other
point. We obtain statistical properties of these partial minima using exact probabilistic methods
and heuristic scaling techniques. The average number of partial minima, A, decays algebraically
with the total number of points, $A \sim N^{-(d-k)/k}$, when $1 \le k < d$. Interestingly, there are $k-1$
distinct scaling laws characterizing the largest coordinates as the distribution $P(y_j)$ of the jth largest
coordinate, $y_j$, decays algebraically, $P(y_j) \sim (y_j)^{-\alpha_j-1}$, with $\alpha_j = j\,\frac{d-k}{k-j}$ for $1 \le j \le k-1$. The
average number of partial minima grows logarithmically, $A \simeq \frac{1}{(d-1)!}(\ln N)^{d-1}$, when k = d. The full
distribution of the number of minima is obtained in closed form in two dimensions.

PACS numbers: 02.50.Cw, 05.40.-a, 89.20.Ff, 89.75.Da



A host of decisions in computer science, economics, 
politics, and everyday life involve multiple criteria or mul-
tiple objectives [H, [1,11, Q ■ A pedestrian choosing a walk- 
ing path considers the distance, the number of turns, and 
the number of traffic lights. In business, takeover bids are 
decided on a multitude of complex conditions in addition 
to the total monetary offer. In elections, voters examine 
how candidates stand on multiple issues. 

In multi-objective optimization, a solution that is op- 
timal with respect to all criteria is rarely possible and 
instead, one faces a set of choices that are suboptimal on 
most criteria. Decisions require algorithms to weed out 
inferior choices, sort through all the remaining imperfect 
choices, and evaluate their overall quality. 

Motivated by multi-criteria decision problems, we 
study the statistics of multi-variate imperfect minima. 
We consider a set of N points in d dimensions, with coordinates
$\mathbf{x} = (x_1, x_2, \ldots, x_d)$. Each coordinate $x_i \ge 0$
is a distinct cost and by convention, small-x values are
superior and are considered dominant. 

Partial minima, analogous to imperfect choices, are defined
as follows. A point $\mathbf{x}$ is said to be k-dominated
by $\mathbf{x}'$ when at least k of the coordinates of $\mathbf{x}$ are larger
than the corresponding coordinates of $\mathbf{x}'$. A point is
said to be a partial minimum, or formally a k-minimum,
when it is not k-dominated by any other point in the
set. We stress that a partial minimum is not required
to dominate all other points on the same $d-k$ coordinates
and may dominate different points along different
coordinates. The parameter $1 \le k \le d$ quantifies the
quality of the partial minimum: a smaller k value represents
a more stringent condition. The two extremes are
the perfect minimum, k = 1, where every coordinate is
a minimum of the point set, and the efficient set, k = d,
that includes all points that are not obviously dominated
by other points as shown later in figure 1. Partial minima
are conditional multivariate extrema and their properties
are amenable to analysis using a statistical physics per-
spective [i,[i,0,[i,[ii- 
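
The definition of a k-minimum can be checked directly on a small point set. The following is a minimal brute-force sketch, not the authors' code; the function names `k_dominated` and `k_minima` are hypothetical, and the quadratic-time pairwise scan is chosen for clarity rather than speed.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def k_dominated(x: Sequence[float], other: Sequence[float], k: int) -> bool:
    """True if at least k coordinates of x are larger than those of other."""
    return sum(a > b for a, b in zip(x, other)) >= k

def k_minima(points: List[Point], k: int) -> List[Point]:
    """Return the points that are not k-dominated by any other point."""
    return [
        p for i, p in enumerate(points)
        if not any(k_dominated(p, q, k) for j, q in enumerate(points) if j != i)
    ]

if __name__ == "__main__":
    pts = [(0.1, 0.9), (0.2, 0.5), (0.6, 0.3), (0.7, 0.8), (0.9, 0.1)]
    print(k_minima(pts, 2))  # efficient set (k = d)
    print(k_minima(pts, 1))  # perfect minima (k = 1)
```

With k = d the function returns the efficient set, and with k = 1 it returns a point only if that point is the minimum on every coordinate.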

In this study, we obtain exact statistical properties of 
partial minima including the multivariate density and its 
asymptotic behavior as well as scaling properties such 
as the typical size and average number of minima. We 
present two major results. First, as a function of the
set size N, the average number of minima decays algebraically
when $1 \le k < d$, and grows logarithmically
when k = d. Second, there are $k-1$ different scaling
laws for the largest coordinates, each following a power-law
distribution with $k-1$ distinct exponents. The rest of
the $d+1-k$ coordinates are characterized by distributions
with sharp tails. We also discuss the relevance of these 
results to the multi-objective shortest path on graphs, a 
central problem in multi-objective optimization. 

We consider the situation where there are no correla- 
tions between the coordinates. That is, each coordinate 
is independently drawn from some distribution. As dis- 
cussed below, this situation is equivalent to a uniform dis- 
tribution in the unit hypercube. Thus, we conveniently 
assume that $x_i$ is uniformly distributed in [0 : 1] for all
$1 \le i \le d$.

Heuristic Arguments. Elementary scaling laws for the 
typical size of a partial minimum and the average num- 
ber of minima are derived heuristically. We assume that 
(i) the partial minimum is dominant on a fixed set of k 
coordinates, and (ii) all its coordinates are equal, $x_i = x$,
for all i. By the partial minimum definition, the corresponding
k-dimensional hypercube contains only the
partial minimum itself. The volume of this hypercube is
$x^k$ and the expected number of points inside this hypercube
must be of order one, $N x^k \sim 1$. Consequently, the
typical size x decays algebraically with N,

$$x \sim N^{-1/k}. \tag{1}$$

This characteristic scale decreases as the minimum con- 
dition becomes more stringent, that is, as k decreases. 
The expected number of partial minima

$$A \sim N^{-\frac{d-k}{k}} \tag{2}$$

follows from the expected number of points inside the
d-dimensional hypercube with linear dimension x, $N x^d$.
Partial minima are asymptotically rare and the scale (1)
decays indefinitely. Furthermore, with a small probability,
there is only one minimum when N is large.
The scaling estimate (2) coincides with the exact value
$A = N^{-(d-1)}$ for k = 1, since any point is a perfect minimum
with probability $N^{-d}$. For k = d, the minimum in
any one coordinate is a partial minimum and thus, there
is at least one partial minimum. Indeed, the decay exponent
in (2) vanishes. This special case is discussed
separately. 

The Density of Minima. The density $P_{d,k}(\mathbf{x})$ of k-minima
located at $\mathbf{x}$ is obtained analytically through a
formal generalization of the heuristic argument above.
For example, in two dimensions the density is

$$
P_{2,k}(x_1,x_2) =
\begin{cases}
N\,[1-(x_1+x_2-x_1x_2)]^{N-1} & k=1,\\
N\,[1-x_1x_2]^{N-1} & k=2.
\end{cases}
$$



The factor N is the number of ways to choose the minimum,
and the second factor guarantees that the rest
of the points do not dominate the minimum at $(x_1,x_2)$.
These points must not fall inside an L-shaped region of
area $x_1+x_2-x_1x_2$ or equivalently $1-(1-x_1)(1-x_2)$
when k = 1 or a rectangle of area $x_1x_2$ when k = 2.
In general, the density of minima

$$P_{d,k}(\mathbf{x}) = N\,[1-G_{d,k}(\mathbf{x})]^{N-1} \tag{3}$$

reflects that the remaining $N-1$ points are excluded from a
d-dimensional region of volume $G_{d,k}(\mathbf{x})$. The excluded
volume obeys the recursion

$$G_{d,k}(\mathbf{x}) = x_d\,G_{d-1,k-1}(\mathbf{x}) + (1-x_d)\,G_{d-1,k}(\mathbf{x}). \tag{4}$$

In our notation, the dimensional index of a function dictates
the dimension of its vectorial argument so the vectors
on the right hand side of (4) have $d-1$ components.
We obtain the recursion relation (4) by separating the
excluded region into two regions: one in which the dth
coordinate is dominant and one in which it is not. Using
the boundary conditions $G_{d,0} = 1$ and $G_{d,k} = 0$ when
k > d, we recover $G_{1,1} = x_1$ and $G_{2,1} = x_1+x_2-x_1x_2$.
Furthermore,

$$
G_{3,k} =
\begin{cases}
x_1+x_2+x_3-x_1x_2-x_1x_3-x_2x_3+x_1x_2x_3 & k=1,\\
x_1x_2+x_1x_3+x_2x_3-2x_1x_2x_3 & k=2,\\
x_1x_2x_3 & k=3.
\end{cases}
$$

In general, $G_{d,d} = \prod_{i=1}^{d} x_i$ and $G_{d,1} = 1-\prod_{i=1}^{d}(1-x_i)$.
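
The recursion (4) with its boundary conditions can be evaluated directly. The following is a minimal sketch under the stated boundary conditions; the function names `excluded_volume` and `density` are hypothetical, and the plain recursion is exponential in d, which is acceptable only for small dimensions.

```python
from typing import Sequence

def excluded_volume(x: Sequence[float], k: int) -> float:
    """G_{d,k}(x) from G_{d,k} = x_d G_{d-1,k-1} + (1 - x_d) G_{d-1,k}.

    Boundary conditions: G_{d,0} = 1 and G_{d,k} = 0 when k > d.
    """
    d = len(x)
    if k <= 0:
        return 1.0
    if k > d:
        return 0.0
    head, last = x[:-1], x[-1]
    return last * excluded_volume(head, k - 1) + (1.0 - last) * excluded_volume(head, k)

def density(x: Sequence[float], k: int, n_points: int) -> float:
    """Density of k-minima at x, P_{d,k}(x) = N [1 - G_{d,k}(x)]^(N-1)."""
    return n_points * (1.0 - excluded_volume(x, k)) ** (n_points - 1)

if __name__ == "__main__":
    x = (0.2, 0.5, 0.7)
    for k in (1, 2, 3):
        print(k, excluded_volume(x, k))
```

For x = (0.2, 0.5, 0.7) the three values can be compared by hand with the explicit expressions for $G_{3,k}$ given above.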
Scaling. In the limit $N \to \infty$, the product term $x_1x_2$
in $P_{2,1} = N[1-(x_1+x_2-x_1x_2)]^{N-1}$ is negligible compared
with the linear term $x_1+x_2$ and thus,

$$P_{2,1}(x_1,x_2) \to N e^{-N(x_1+x_2)}.$$

Generally, only the kth degree terms are asymptotically
relevant and the leading behavior is

$$P_{d,k}(\mathbf{x}) \simeq N \exp\left[-N F_{d,k}(\mathbf{x})\right]. \tag{5}$$

The auxiliary function $F_{d,k}(\mathbf{x})$ contains $\binom{d}{k}$ terms, each
a distinct product of degree k. For example,

$$
F_{3,k} =
\begin{cases}
x_1+x_2+x_3 & k=1,\\
x_1x_2+x_1x_3+x_2x_3 & k=2,\\
x_1x_2x_3 & k=3.
\end{cases}
$$

The auxiliary function equals the sum, $F_{d,1} = \sum_{i=1}^{d} x_i$,
and the product, $F_{d,d} = \prod_{i=1}^{d} x_i$, in the two extremes.
The function $F_{d,k}(\mathbf{x})$ is defined recursively

$$F_{d,k}(\mathbf{x}) = x_d\,F_{d-1,k-1}(\mathbf{x}) + F_{d-1,k}(\mathbf{x}) \tag{6}$$

for $1 \le k \le d$ with the boundary condition $F_{0,k} = \delta_{k,0}$.
This recursion follows from (4) by dropping the higher-degree
term $x_d\,G_{d-1,k}(\mathbf{x})$.
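
Since $F_{d,k}$ is a sum of all distinct degree-k products, the recursion (6) can be cross-checked against a direct enumeration. The sketch below is illustrative only; the names `leading_term` and `leading_term_direct` are hypothetical, and setting $F_{d,0} = 1$ for every d is an assumption that extends the stated boundary condition $F_{0,k} = \delta_{k,0}$.

```python
from itertools import combinations
from math import prod
from typing import Sequence

def leading_term(x: Sequence[float], k: int) -> float:
    """F_{d,k}(x) from the recursion F_{d,k} = x_d F_{d-1,k-1} + F_{d-1,k}."""
    d = len(x)
    if k == 0:
        return 1.0  # assumed: empty product, consistent with F_{0,0} = 1
    if k < 0 or k > d:
        return 0.0
    return x[-1] * leading_term(x[:-1], k - 1) + leading_term(x[:-1], k)

def leading_term_direct(x: Sequence[float], k: int) -> float:
    """Sum over all distinct products of k coordinates."""
    return sum(prod(c) for c in combinations(x, k))

if __name__ == "__main__":
    x = (0.2, 0.5, 0.7)
    for k in (1, 2, 3):
        print(k, leading_term(x, k), leading_term_direct(x, k))
```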

The asymptotic behavior (5) can be recast in the scaling
form

$$P_{d,k}(\mathbf{x}) \to N\,\Phi_{d,k}(\mathbf{z}) \tag{7}$$

as $N \to \infty$. The scaling variable is $\mathbf{z} = \mathbf{x}\,N^{1/k}$, in accord
with (1), and the scaling function is

$$\Phi_{d,k}(\mathbf{z}) = e^{-F_{d,k}(\mathbf{z})}. \tag{8}$$

The average number of k-minima equals the integral
of the density, $A_{d,k} = \int d\mathbf{x}\,P_{d,k}(\mathbf{x})$, where
$\int d\mathbf{x} = \prod_{i=1}^{d}\int_0^1 dx_i$. When k < d, the asymptotic
behavior of the average follows from the scaling form (7),

$$A_{d,k} \simeq a_{d,k}\,N^{-\frac{d-k}{k}} \tag{9}$$

and is in agreement with (2). The proportionality constant
$a_{d,k}$ equals the integral of the scaling function,
$a_{d,k} = \int d\mathbf{z}\,\Phi_{d,k}(\mathbf{z})$, although now, the integration range
is unrestricted, $\int d\mathbf{z} = \prod_{i=1}^{d}\int_0^\infty dz_i$. The prefactor is
trivial for perfect minima, $a_{d,1} = 1$, and otherwise, it
can be obtained analytically only in a few exceptional
cases including for example $a_{3,2} = \frac{1}{2}\pi^{3/2}$.
Extreme Statistics. By definition, partial minima may 
be dominant on certain coordinates but inferior on oth- 
ers. We therefore study extremal statistics [III HI, [HI to 
investigate possible disparities between the coordinates. 

First, consider the largest coordinate. Without
loss of generality, we order the coordinates
$x_1 \le x_2 \le \cdots \le x_{d-1} \le x_d$. Our focus is on the tail of
the distribution of the variable $x_d$, corresponding to the
regime $x_d \gg x_{d-1}$. We also restrict our attention to the
limit $N \to \infty$. The distribution $Q_1(x_d)$ of the largest
coordinate $x_d$ equals the integral of the multivariate
distribution with respect to the rest of the coordinates,

$$
\begin{aligned}
Q_1(x_d) &= \int dx_1 \cdots \int dx_{d-1}\, P_{d,k}(x_1,x_2,\ldots,x_d) \\
&\simeq \int dx_1 \cdots \int dx_{d-1}\, N e^{-N F_{d,k}(\mathbf{x})} \\
&\simeq \int dx_1 \cdots \int dx_{d-1}\, N e^{-N x_d F_{d-1,k-1}(\mathbf{x})} \\
&\sim N^{-\frac{d-k}{k-1}}\,(x_d)^{-\frac{d-k}{k-1}-1}.
\end{aligned}
\tag{10}
$$



The second line is obtained by substituting the leading 
asymptotic behavior (5) and the third line reflects that
only the first term in (6) is relevant when $x_d \gg x_i$ for all
i < d. Our last step is to multiply and divide the third
line by $x_d$ and then invoke the scaling law (9) for the
average number of $(k-1)$-minima in $d-1$ dimensions. In
essence, we utilize the fact that when one of the coordi- 
nates is very large, the partial minima criterion involves 
one less constraint in one less dimension [l^ . The power- 
law decay of the distribution (10) shows that there is a
substantial likelihood that Xd is relatively large. 

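The distribution $Q_2(x_{d-1})$ of the second largest coordinate
$x_{d-1}$ is obtained using the bivariate distribution
$Q(x_{d-1},x_d)$,

$$
\begin{aligned}
Q(x_{d-1},x_d) &= \int dx_1 \cdots \int dx_{d-2}\, P_{d,k}(x_1,x_2,\ldots,x_d) \\
&\simeq \int dx_1 \cdots \int dx_{d-2}\, N e^{-N F_{d,k}(\mathbf{x})} \\
&\simeq \int dx_1 \cdots \int dx_{d-2}\, N e^{-N x_{d-1}x_d F_{d-2,k-2}(\mathbf{x})} \\
&\sim N^{-\frac{d-k}{k-2}}\,(x_{d-1}x_d)^{-\frac{d-k}{k-2}-1}.
\end{aligned}
\tag{11}
$$

The distribution $Q_2(x_{d-1})$ equals the integral of the bivariate
distribution with respect to the largest coordinate,
$Q_2(x_{d-1}) = \int_{x_{d-1}}^{1} dx_d\,Q(x_{d-1},x_d)$. This integral is
dominated by the divergence at the lower limit of integration,
and consequently

$$Q_2(x_{d-1}) \sim N^{-\frac{d-k}{k-2}}\,(x_{d-1})^{-\frac{2(d-k)}{k-2}-1}. \tag{12}$$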



The power-law tail is now steeper. 

A similar calculation applies to the distributions of
the $k-1$ largest elements. In general, the distribution
$Q_j(y_j)$ of the jth largest element, $y_j$, with the definition
$y_j = x_{d+1-j}$, decays as a power law,

$$Q_j(y_j) \sim N^{-\frac{d-k}{k-j}}\,(y_j)^{-\alpha_j-1} \tag{13}$$

for $1 \le j \le k-1$. The decay exponent increases monotonically
with the index j,

$$\alpha_j = j\,\frac{d-k}{k-j}. \tag{14}$$

We can verify the decay law (9) using
$A \sim \int_{N^{-1/k}} dy_j\,Q_j(y_j)$ where the lower limit of integration
is set by the typical size scale (1). Interestingly,
there are $k-1$ distinct scaling behaviors for the $k-1$
largest elements. Each of these extremal coordinates is
distributed according to a power-law distribution that is
characterized by a distinct exponent.
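
The consistency check described above amounts to exponent bookkeeping, which can be done in exact arithmetic. The sketch below assumes the power-law form (13) with prefactor exponent $-(d-k)/(k-j)$ and a lower cutoff $N^{-1/k}$; the function names `alpha` and `average_exponent_from_tail` are hypothetical.

```python
from fractions import Fraction

def alpha(d: int, k: int, j: int) -> Fraction:
    """Decay exponent alpha_j = j (d - k) / (k - j) for 1 <= j <= k - 1."""
    if not (1 <= j <= k - 1 and k <= d):
        raise ValueError("requires 1 <= j <= k - 1 and k <= d")
    return Fraction(j * (d - k), k - j)

def average_exponent_from_tail(d: int, k: int, j: int) -> Fraction:
    """Exponent of N obtained by integrating Q_j(y) from the cutoff N^(-1/k).

    The prefactor contributes -(d - k)/(k - j); the divergence of
    y^(-alpha_j - 1) at the lower cutoff contributes +alpha_j / k.
    """
    return -Fraction(d - k, k - j) + alpha(d, k, j) / k

if __name__ == "__main__":
    d, k = 4, 3
    for j in range(1, k):
        print(j, alpha(d, k, j), average_exponent_from_tail(d, k, j))
```

For every admissible j the second function returns $-(d-k)/k$, the exponent in (9), which is the verification stated in the text.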

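This multiscaling behavior affects the behavior of the
moments $\langle y_j^m \rangle$ defined as follows, $\langle y_j^m \rangle = I_m/I_0$, where
$I_m = \int_{N^{-1/k}} dy_j\,y_j^m\,Q_j(y_j)$. The integral is dominated
by the divergence at the lower cutoff when the
order is small, $m < \alpha_j$, but otherwise, the integral
is finite. Consequently, the moments have the following
scaling dependences on N

$$
\langle y_j^m \rangle \sim
\begin{cases}
N^{-m/k} & m < \alpha_j,\\
N^{-\frac{j(d-k)}{k(k-j)}}\,\ln N & m = \alpha_j,\\
N^{-\frac{j(d-k)}{k(k-j)}} & m > \alpha_j.
\end{cases}
\tag{15}
$$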
Low order moments exhibit ordinary scaling behavior as
they are characterized by the typical size scale (1) that
underlies the multivariate distribution function (5). As
usual, there is a logarithmic correction at the crossover.
High-order moments plateau at a fixed value that is independent
of the index m, an indication that there is a
significant probability that the extreme elements are of
order one. Interestingly, the average size of the different
coordinates may follow different scaling laws. For
example, there are two scaling laws, $\langle y_1 \rangle \sim N^{-1/6}$ and
$\langle y_2 \rangle \sim N^{-1/3}$ when d = 4 and k = 3. Of course, the sum
$\sum_{i=1}^{d} x_i$ exhibits the same extremal statistics as does $x_d$.
The crossover moment or equivalently the exponent
$\alpha_j$ diverges as $j \to k$. Therefore, the smallest $d+1-k$
coordinates exhibit the ordinary scaling behavior

$$\langle y_j^m \rangle \sim N^{-m/k} \tag{16}$$

for $k \le j \le d$ and all moments of the respective distribution
functions must be finite. In these cases, the distribution
functions $Q_j$ have tails that are as sharp as or
sharper than an exponential. In the aforementioned case
d = 4 and k = 3, the third and the fourth largest coordinates
exhibit the ordinary scaling, $\langle y_3 \rangle \sim \langle y_4 \rangle \sim N^{-1/3}$.
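
The three regimes of the moments, together with the ordinary scaling of the smallest coordinates, can be collected in one lookup. This is a minimal sketch of the exponent bookkeeping only, assuming the scaling forms quoted above; the function name `moment_exponent` and its return convention (power of N, power of ln N) are hypothetical.

```python
from fractions import Fraction
from typing import Tuple

def moment_exponent(d: int, k: int, j: int, m: int) -> Tuple[Fraction, int]:
    """Return (a, b) such that <y_j^m> ~ N^a (ln N)^b for k < d.

    j indexes the jth largest coordinate, 1 <= j <= d, and m > 0.
    """
    if not (1 <= j <= d and 1 <= k < d and m > 0):
        raise ValueError("requires 1 <= j <= d, 1 <= k < d, m > 0")
    if j >= k:
        # smallest d + 1 - k coordinates: ordinary scaling
        return Fraction(-m, k), 0
    a_j = Fraction(j * (d - k), k - j)  # crossover moment alpha_j
    if m < a_j:
        return Fraction(-m, k), 0
    if m == a_j:
        return -a_j / k, 1  # logarithmic correction at the crossover
    return -a_j / k, 0

if __name__ == "__main__":
    for j in (1, 2, 3, 4):
        print(j, moment_exponent(4, 3, j, 1))
```

For d = 4 and k = 3 the first moments come out as $N^{-1/6}$ for j = 1 and $N^{-1/3}$ for j = 2, 3, 4, matching the example in the text.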
Efficient Sets. The set of points that are not dominated 
on all coordinates by any other point are partial minima 
when k = d (figure 1). We refer to this set as the "efficient
set". The efficient set, also termed the efficient frontier or
Pareto equilibria, plays a central role in multi-objective 
optimization and has been studied in economics, com- 
puter science, operations research, and game theory be- 
cause every point in the set is a candidate solution to the 
multi-objective optimization problem, depending on the
relative weights of the various costs [ll, [la] ■ 

In the special case k = d, the expected size of the
efficient set, $E_d(N) \equiv A_{d,d}(N)$, obeys the recursion

$$E_d(N) = E_d(N-1) + \frac{1}{N}\,E_{d-1}(N). \tag{17}$$
FIG. 1: Illustration of the efficient set in two dimensions.
Filled squares are on the efficient set and unfilled squares are 
not. Only four of the filled squares are on the convex hull. 



The point with the largest $x_d$ coordinate certainly does
not dominate any other point. Furthermore, this point
is on the efficient set if and only if the rest of its $d-1$
coordinates are not dominated by any other point. This
event occurs with probability $\frac{1}{N}E_{d-1}(N)$ and hence, the
second term in the recursion. We note that the recursion
(17) can also be obtained by performing the integration
over $x_d$ in $E_d(N) = N\int d\mathbf{x}\,[1-x_1x_2\cdots x_d]^{N-1}$. This
integration is analytically feasible only if k = 1 or k = d.

The recursion relation (17) is subject to the boundary
condition $E_1(N) = 1$. In two dimensions,

$$E_2(N) = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{N}, \tag{18}$$

or alternatively, $E_2(N) = H(N)$, where $H(N) = \sum_{n=1}^{N} n^{-1}$
is the harmonic number. The average size of the efficient
set grows logarithmically, $E_2(N) = \ln N + \gamma + \cdots$
where $\gamma = 0.57721$ is Euler's constant. In three dimensions,
we have $E_3(N) = \sum_{n=1}^{N} n^{-1}H(n)$; asymptotically,
$E_3(N) \simeq \frac{1}{2}(\ln N)^2$. The large-N behavior is obtained
in general by converting the difference equation
(17) into a differential equation $dE_d/dN = E_{d-1}/N$. The
expected size of the efficient set grows logarithmically,

$$E_d(N) \simeq \frac{1}{(d-1)!}\,(\ln N)^{d-1}. \tag{19}$$
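
The recursion (17) is easy to iterate exactly. The following is a minimal sketch using rational arithmetic; the function name `expected_efficient_set` is hypothetical, and starting each dimension from $E_d(0) = 0$ (an empty set has no minima) is an assumption that supplements the stated boundary condition $E_1(N) = 1$.

```python
from fractions import Fraction
from math import factorial, log

def expected_efficient_set(d: int, n_points: int) -> Fraction:
    """E_d(N) from E_d(N) = E_d(N-1) + E_{d-1}(N)/N, with E_1(N) = 1."""
    if d < 1 or n_points < 1:
        raise ValueError("requires d >= 1 and N >= 1")
    # row[n] holds E_dim(n); index 0 is the assumed empty-set value 0
    row = [Fraction(0)] + [Fraction(1)] * n_points
    for _ in range(2, d + 1):
        new = [Fraction(0)] * (n_points + 1)
        for n in range(1, n_points + 1):
            new[n] = new[n - 1] + row[n] / n
        row = new
    return row[n_points]

if __name__ == "__main__":
    n = 1000
    for d in (2, 3, 4):
        exact = float(expected_efficient_set(d, n))
        leading = log(n) ** (d - 1) / factorial(d - 1)
        print(d, exact, leading)
```

The printed comparison with the leading term of (19) illustrates that the subleading logarithmic corrections are still sizable at moderate N.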



This logarithmic growth reflects that the integral of the 
scaling function, $\int d\mathbf{z}\,\Phi_{d,d}(\mathbf{z})$, is divergent at the upper
limit. A straightforward generalization of the calculation 
above shows that the distribution of the extremal coor- 
dinates has a logarithmic correction. 



(20) 



for $1 \le j \le d-1$. We can verify that the average
number of points is consistent with the exact behavior
$\int_{N^{-1/d}} dy_j\,Q_j(y_j) \sim (\ln N)^{d-1}$ as in (19). The crossover
moment vanishes and the moments decay logarithmically,

$$\langle y_j^m \rangle \sim (\ln N)^{-j}, \tag{21}$$

where m > 0 and $1 \le j \le d-1$.



Two-dimensions. In two dimensions, the full probability
distribution function $P_n(N)$ that the efficient set
includes n points, where $1 \le n \le N$, satisfies the recur-
sion [l3| 

$$P_n(N) = (1-N^{-1})\,P_n(N-1) + N^{-1}\,P_{n-1}(N-1) \tag{22}$$

and is subject to the boundary condition $P_n(0) = \delta_{n,0}$.
On the square, there are two coordinates: $x_1$ and $x_2$.
Following the reasoning behind (17), the point with the
largest $x_2$ coordinate is on the efficient set if and only if
its $x_1$ coordinate is minimal, an event that occurs with
probability $N^{-1}$.

Recursion equations for the average $E(N) = \langle n \rangle$ and the
variance $V(N) = \langle n^2 \rangle - \langle n \rangle^2$, with
$\langle f(n) \rangle = \sum_{n=1}^{N} f(n)\,P_n(N)$, are obtained by summing
(22). The average satisfies $E(N) = E(N-1) + N^{-1}$
in accord with (17) and the variance satisfies
$V(N) = V(N-1) + N^{-1} - N^{-2}$. Thus, the variance
equals the difference between the first and the
second harmonic numbers

$$V(N) = H(N) - H^{(2)}(N) \tag{23}$$

where $H^{(2)}(N) = \sum_{n=1}^{N} n^{-2}$. The variance and the
average have identical leading asymptotic behaviors,
$V(N) = \ln N + (\gamma - \frac{1}{6}\pi^2) + \cdots$.

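With the transformation $P_n(N) = \frac{1}{N!}\,p_n(N)$, the auxiliary
function $p_n(N)$ satisfies the recursion

$$p_n(N) = (N-1)\,p_n(N-1) + p_{n-1}(N-1) \tag{24}$$

with $p_n(0) = \delta_{n,0}$. This recursion defines the (unsigned) Stirling
numbers of the first kind $\left[{N \atop n}\right]$ [l|' so $p_n(N) = \left[{N \atop n}\right]$. Therefore, the full
probability distribution is expressed in closed form,

$$P_n(N) = \frac{1}{N!}\left[{N \atop n}\right] \tag{25}$$

for $0 \le n \le N$.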
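
The closed form (25) and the moments (23) can be confirmed by iterating the recursion (22) in exact arithmetic. The sketch below is an illustrative check rather than part of the derivation; the function names `efficient_set_distribution` and `stirling_first_unsigned` are hypothetical.

```python
from fractions import Fraction
from math import factorial
from typing import List

def efficient_set_distribution(n_points: int) -> List[Fraction]:
    """P_n(N) for n = 0..N from recursion (22), starting at P_n(0) = delta_{n,0}."""
    dist = [Fraction(1)]
    for m in range(1, n_points + 1):
        prev = dist + [Fraction(0)]
        inv = Fraction(1, m)
        dist = [
            (1 - inv) * prev[n] + (inv * prev[n - 1] if n >= 1 else 0)
            for n in range(m + 1)
        ]
    return dist

def stirling_first_unsigned(n_points: int) -> List[int]:
    """Unsigned Stirling numbers of the first kind [N, n] for n = 0..N via (24)."""
    row = [1]
    for m in range(1, n_points + 1):
        prev = row + [0]
        row = [(m - 1) * prev[n] + (prev[n - 1] if n >= 1 else 0) for n in range(m + 1)]
    return row

if __name__ == "__main__":
    N = 6
    dist = efficient_set_distribution(N)
    mean = sum(n * p for n, p in enumerate(dist))
    var = sum(n * n * p for n, p in enumerate(dist)) - mean ** 2
    print(dist, mean, var)
```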

The general asymptotic behavior, derived in [l£ 



$$P_n(N) \simeq \frac{1}{N\,\Gamma(n/\ln N)}\,\frac{(\ln N)^{n}}{n!} \tag{26}$$



applies in the limit $n \to \infty$, $N \to \infty$ with the ratio $n/\ln N$
finite. For small $n \ll \ln N$, the distribution is Poissonian,
$P_n(N) \simeq N^{-1}(\ln N)^{n-1}/(n-1)!$ and for large n, the distribution
approaches a Gaussian centered at the average
$E(N) \simeq \ln N$ with the variance $V(N) \simeq \ln N$,

$$P_n(N) \simeq \frac{1}{\sqrt{2\pi \ln N}}\,\exp\left[-\frac{(n-\ln N)^2}{2\ln N}\right]. \tag{27}$$



We note that the convex hull, a subset of the efficient 
set (see figure 1), is characterized by similar statistical 
properties including a limiting Gaussian distribution and 
logarithmic growths, albeit with different prefactors, of 
the average and the variance [1^ [U, |23| ■ 
Multi-Objective Shortest Path. The multi-objective
shortest path on a graph is defined as follows. Consider 
a graph, possibly with multiple edges connecting pairs of 
nodes, with d different costs on each edge. Fix the source 
and the destination nodes, and then consider all possible 
paths from source to destination, assigning a vector of 
d total costs to each path. The multi-objective shortest 
path problem requires identification of the efficient set of 
paths. Generally, finding the efficient set is an NP-hard 
problem, although efficient approximation schemes exist 
[23l [23 |. The computation time of the approximation 
scheme depends crucially on the size of the efficient set. 

Suppose the weights are assigned independently at ran- 
dom to the edges. We can consider two limiting topolo- 
gies. First, for a graph of two nodes connected by N 
edges, the efficient set grows only polylogarithmically in 
the number of edges following the calculation above. Sec- 
ond, for a one-dimensional chain of nodes where each pair 
of neighboring nodes is connected by a pair of edges, 
the weights of the paths become correlated. We
have conducted numerical studies, and found that the 
size of the efficient set is highly sensitive to the distri- 
bution of weights on the edges. Assuming each edge has 
two weights, $(w_1, w_2)$, both chosen from some continuous
distribution, the convex hull grows linearly in the length 
of the chain. Interestingly, we observed various behaviors 
for the size of the efficient set, ranging from linear in the 
length of the chain, to power law behavior with various 
exponents greater than unity, up to stretched exponential 
behavior. 
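
For the chain topology, the efficient set of paths can be built link by link, because a partial path that is dominated stays dominated once the same suffix is appended to it and to its dominator. The following is a minimal sketch of that standard dynamic-programming idea; it is not necessarily the procedure used in the numerical studies mentioned above, and the function names and the list-of-links input format are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Cost = Tuple[float, ...]

def dominated(v: Cost, u: Cost) -> bool:
    """True if v is larger than u on every coordinate (k = d domination)."""
    return all(a > b for a, b in zip(v, u))

def pareto_filter(vectors: Sequence[Cost]) -> List[Cost]:
    """Keep the cost vectors that are not dominated by any other vector."""
    unique = sorted(set(vectors))
    return [v for v in unique if not any(dominated(v, u) for u in unique)]

def chain_efficient_set(links: Sequence[Sequence[Cost]]) -> List[Cost]:
    """Efficient set of total path costs on a chain.

    links[i] lists the parallel edges between node i and node i + 1,
    each edge being a tuple of d costs.
    """
    d = len(links[0][0])
    frontier: List[Cost] = [(0,) * d]
    for edges in links:
        extended = [
            tuple(c + w for c, w in zip(cost, edge))
            for cost in frontier
            for edge in edges
        ]
        frontier = pareto_filter(extended)  # prune dominated partial paths
    return frontier

def brute_force_efficient_set(links: Sequence[Sequence[Cost]]) -> List[Cost]:
    """Cross-check: enumerate every path, then filter once."""
    totals = [tuple(sum(col) for col in zip(*path)) for path in product(*links)]
    return pareto_filter(totals)

if __name__ == "__main__":
    links = [[(1, 4), (3, 1)], [(2, 2), (1, 5)], [(4, 1), (1, 3)]]
    print(chain_efficient_set(links))
```

The pruned construction keeps only the current frontier at each link, whereas the brute-force cross-check enumerates all paths, whose number grows exponentially with the length of the chain.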

Finally, consider an Erdős-Rényi random graph of M
nodes (25l . [26| . Two randomly chosen nodes will typi- 
cally have a shortest path distance between them of order 
log M using a metric which simply counts the number of 
edges traversed. While it is possible for a path on the 
efficient set to be longer than this, because the weights 
are positive, we expect that paths on the efficient set will 
be at most of order $\sqrt{\log M}$ longer. The total number
of paths of at most that length grows exponentially in
$\sqrt{\log M}$, and such paths will tend to overlap only near
the source and destination nodes. Thus, to a good ap- 
proximation we expect that the weights of the paths will 
be uncorrelated, enabling us to use the results above. 
Then, the number of paths on the efficient set will only 



be of order (logM)''/^. In general, then, when the paths 
have little overlap as here, the number of paths on the effi- 
cient set is much smaller than in cases like one-dimension 
where the paths greatly overlap. 

Conclusions. We studied statistical properties of par- 
tial minima in a set of uncorrelated points in general 
dimensions. These partial minima are defined by a pa- 
rameter k: a point is a partial minimum if it dominates 
all other points on at least $d-k$ coordinates. As this con-
dition becomes more stringent, partial minima improve 
in quality but are less probable. Remarkably, there is a 
series of distinct power-law distributions that character- 
ize the largest coordinates with a consequent multiscaling 
distribution of the moments, while the rest of the coordi- 
nates obey ordinary scaling. In the extreme case k = d, 
the number of partial minima grows logarithmically with 
the total number of points. 

Our results hold as long as the coordinates of the points are not
correlated, that is, as long as they are drawn from independent
distributions. These distributions need not be
identical. If the ith coordinate is drawn from the distribution
$f_i(x_i)$, the transformations $x_i \to \int_0^{x_i} dy_i\,f_i(y_i)$
and $dx_i \to f_i(x_i)\,dx_i$ map to a uniform distribution in
the unit hypercube. Correlations present an interesting
challenge and we anticipate serious modifications to the
scaling laws above. For instance, it is simple to show that
the size of the efficient set grows as a power of the number
of points, $\sim N^{1/2}$, rather than a logarithm, when the
points are uniformly distributed inside the unit circle.
Incidentally, this growth is much faster than the $N^{1/3}$ for
the corresponding number of points in the convex hull.
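
The mapping to the unit hypercube works because domination depends only on the ordering of each coordinate, and a cumulative distribution function is monotonically increasing. The sketch below illustrates this on a small example; the function names are hypothetical, and the two cumulative distributions (an exponential and the density 2x on [0, 1]) are assumptions chosen purely for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]

def efficient_indices(points: Sequence[Point]) -> List[int]:
    """Indices of points not dominated on all coordinates by another point."""
    return [
        i for i, p in enumerate(points)
        if not any(all(a > b for a, b in zip(p, q)) for q in points)
    ]

def to_uniform(points: Sequence[Point],
               cdfs: Sequence[Callable[[float], float]]) -> List[Point]:
    """Map coordinate i through its cumulative distribution function."""
    return [tuple(cdf(x) for cdf, x in zip(cdfs, p)) for p in points]

if __name__ == "__main__":
    # assumed example: x_1 exponential with unit rate, x_2 with density 2x on [0, 1]
    cdfs = [lambda x: 1.0 - math.exp(-x), lambda x: x * x]
    pts = [(0.3, 0.9), (1.2, 0.4), (2.5, 0.1), (1.5, 0.6), (0.1, 0.95)]
    print(efficient_indices(pts))
    print(efficient_indices(to_uniform(pts, cdfs)))
```

Both calls select the same points, so statistics of the efficient set computed for independent uniform coordinates carry over to any independent continuous distributions.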

Another interesting issue is the crossover from the algebraic
decay (9) to the logarithmic growth (19). The
average number of partial minima decreases monotonically
with N when k is small, but is a non-monotonic
function of N when k is large. For example, when d = 4
and k = 3, the average $A_{d,k}$ peaks at N = 16. It will be
interesting to elucidate how the height and the location 
of this peak scales with A^. 